

Search for: All records

Creators/Authors contains: "Dickinson, Hugh"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Fortson, Lucy; Crowston, Kevin; Kloetzer, Laure; Ponti, Marisa (Ed.)
    In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird & Wonderful project, where volunteers inspected ~200,000 astronomical images—processed by an ML-based anomaly detection model—to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests low-consensus choices should not be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, with which the machine struggled. We also found that a higher ML-generated anomaly score, which quantifies how anomalous an image's low-level features are, was a better predictor of the volunteers' consensus choices. By combining the locus of high volunteer-consensus images within the ML-learnt feature space with the anomaly score, we demonstrate a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay out guidelines for future research looking to adapt and operationalize human–machine collaborative frameworks for efficient anomaly detection in big data.
    Free, publicly-accessible full text available December 9, 2025
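The decision boundary described in record 1 — combining an ML anomaly score with proximity to the locus of high volunteer-consensus images in the learnt feature space — can be sketched as below. All thresholds, the 2-D feature space, and the function name are illustrative assumptions, not values or code from the study.

```python
import numpy as np

def flag_candidates(features, anomaly_scores, consensus_locus,
                    score_threshold=0.8, radius=2.0):
    """Flag images whose ML anomaly score is high AND whose learnt-feature
    vector lies near the locus of high volunteer-consensus images.

    score_threshold and radius are illustrative placeholders, not values
    from the Weird & Wonderful study."""
    dists = np.linalg.norm(features - consensus_locus, axis=1)
    return (anomaly_scores > score_threshold) & (dists < radius)

# Toy usage: 4 images in a hypothetical 2-D learnt feature space.
feats = np.array([[0.1, 0.2], [3.0, 3.1], [0.0, 0.1], [2.9, 3.0]])
scores = np.array([0.9, 0.95, 0.3, 0.5])
locus = np.array([0.0, 0.0])  # centre of the high-consensus region
mask = flag_candidates(feats, scores, locus)
```

Only the first image is both anomalous by the ML score and close to the volunteer-consensus locus, so only it would be forwarded for expert inspection.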
  2. Abstract We study the evolution of the bar fraction in disk galaxies between 0.5 < z < 4.0 using multiband colored images from the JWST Cosmic Evolution Early Release Science Survey (CEERS). These images were classified by citizen scientists in a new phase of the Galaxy Zoo (GZ) project called GZ CEERS. Citizen scientists were asked whether a strong or weak bar was visible in the host galaxy. After considering multiple corrections for observational biases, we find that the bar fraction decreases with redshift in our volume-limited sample (n = 398): from 25(+6/−4)% at 0.5 < z < 1.0 to 3(+6/−1)% at 3.0 < z < 4.0. However, we argue it is appropriate to interpret these fractions as lower limits. Disentangling real changes in the bar fraction from detection biases remains challenging. Nevertheless, we find a significant number of bars up to z = 2.5. This implies that disks are dynamically cool or baryon dominated, enabling them to host bars. This also suggests that bar-driven secular evolution likely plays an important role at higher redshifts. When we distinguish between strong and weak bars, we find that the weak bar fraction decreases with increasing redshift. In contrast, the strong bar fraction is constant between 0.5 < z < 2.5. This implies that the strong bars found in this work are robust, long-lived structures, unless the rate of bar destruction is similar to the rate of bar formation. Finally, our results are consistent with disk instabilities being the dominant mode of bar formation at lower redshifts, while bar formation through interactions and mergers is more common at higher redshifts.
    Free, publicly-accessible full text available June 30, 2026
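A bar fraction like the one in record 2 is a binomial proportion, and small samples in high-redshift bins need proper interval estimates. The sketch below uses a Wilson-score interval; this is a generic illustration, not the paper's debiasing method, and the counts are invented.

```python
import math

def bar_fraction(n_barred, n_disks, z_conf=1.0):
    """Bar fraction with a Wilson-score confidence interval.

    Illustrative only: the GZ CEERS paper's quoted uncertainties come
    from its own bias corrections, not from this simple estimator."""
    p = n_barred / n_disks
    z2 = z_conf ** 2
    centre = (p + z2 / (2 * n_disks)) / (1 + z2 / n_disks)
    half = (z_conf / (1 + z2 / n_disks)) * math.sqrt(
        p * (1 - p) / n_disks + z2 / (4 * n_disks ** 2))
    return p, centre - half, centre + half

# Hypothetical bin: 25 barred disks out of 100 disks.
p, lo, hi = bar_fraction(25, 100)
```

The Wilson interval stays inside [0, 1] and behaves sensibly even when the barred count approaches zero, which matters for the nearly empty highest-redshift bins.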
  3. Abstract Giant star-forming clumps (GSFCs) are areas of intensive star-formation that are commonly observed in high-redshift (z ≳ 1) galaxies, but their formation and role in galaxy evolution remain unclear. Observations of low-redshift clumpy galaxy analogues are rare, but the availability of wide-field galaxy survey data makes the detection of large clumpy galaxy samples much more feasible. Deep Learning (DL), and in particular Convolutional Neural Networks (CNNs), have been successfully applied to image classification tasks in astrophysical data analysis. However, one application of DL that remains relatively unexplored is that of automatically identifying and localizing specific objects or features in astrophysical imaging data. In this paper, we demonstrate the use of DL-based object detection models to localize GSFCs in astrophysical imaging data. We apply the Faster Region-based Convolutional Neural Network object detection framework (FRCNN) to identify GSFCs in low-redshift (z ≲ 0.3) galaxies. Unlike other studies, we train different FRCNN models on observational data that was collected by the Sloan Digital Sky Survey and labelled by volunteers from the citizen science project ‘Galaxy Zoo: Clump Scout’. The FRCNN model relies on a CNN component as a ‘backbone’ feature extractor. We show that CNNs that have been pre-trained for image classification using astrophysical images outperform those that have been pre-trained on terrestrial images. In particular, we compare a domain-specific CNN – ‘Zoobot’ – with a generic classification backbone and find that Zoobot achieves higher detection performance. Our final model is capable of producing GSFC detections with a completeness and purity of ≥0.8 while only being trained on ∼5000 galaxy images.
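Record 3 evaluates its detector with completeness and purity. Those metrics can be computed by matching detected boxes to ground-truth clumps via intersection-over-union (IoU), sketched below. The greedy one-to-one matching and the 0.5 IoU cut are common conventions, not necessarily the paper's exact criterion.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def completeness_purity(detections, truths, iou_min=0.5):
    """Completeness (fraction of true clumps recovered) and purity
    (fraction of detections matching a true clump), via greedy
    one-to-one IoU matching."""
    matched_truth = set()
    tp = 0
    for det in detections:
        best, best_j = 0.0, None
        for j, t in enumerate(truths):
            if j in matched_truth:
                continue
            v = iou(det, t)
            if v > best:
                best, best_j = v, j
        if best >= iou_min:
            matched_truth.add(best_j)
            tp += 1
    completeness = tp / len(truths) if truths else 1.0
    purity = tp / len(detections) if detections else 1.0
    return completeness, purity

# Toy example: one detection overlaps a true clump, one is spurious,
# and one true clump is missed entirely.
dets = [(1, 1, 11, 11), (50, 50, 60, 60)]
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
comp, pur = completeness_purity(dets, truths)
```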
  4. ABSTRACT We present detailed morphology measurements for 8.67 million galaxies in the DESI Legacy Imaging Surveys (DECaLS, MzLS, and BASS, plus DES). These are automated measurements made by deep learning models trained on Galaxy Zoo volunteer votes. Our models typically predict the fraction of volunteers selecting each answer to within 5–10 per cent for every answer to every GZ question. The models are trained on newly collected votes for DESI-LS DR8 images as well as historical votes from GZ DECaLS. We also release the newly collected votes. Extending our morphology measurements outside of the previously released DECaLS/SDSS intersection increases our sky coverage by a factor of 4 (5000–19 000 deg²) and allows for full overlap with complementary surveys including ALFALFA and MaNGA.
  5. ABSTRACT Astronomers have typically set out to solve supervised machine learning problems by creating their own representations from scratch. We show that deep learning models trained to answer every Galaxy Zoo DECaLS question learn meaningful semantic representations of galaxies that are useful for new tasks on which the models were never trained. We exploit these representations to outperform several recent approaches at practical tasks crucial for investigating large galaxy samples. The first task is identifying galaxies of similar morphology to a query galaxy. Given a single galaxy assigned a free text tag by humans (e.g. ‘#diffuse’), we can find galaxies matching that tag for most tags. The second task is identifying the most interesting anomalies to a particular researcher. Our approach is 100 per cent accurate at identifying the most interesting 100 anomalies (as judged by Galaxy Zoo 2 volunteers). The third task is adapting a model to solve a new task using only a small number of newly labelled galaxies. Models fine-tuned from our representation are better able to identify ring galaxies than models fine-tuned from terrestrial images (ImageNet) or trained from scratch. We solve each task with very few new labels; either one (for the similarity search) or several hundred (for anomaly detection or fine-tuning). This challenges the longstanding view that deep supervised methods require new large labelled data sets for practical use in astronomy. To help the community benefit from our pretrained models, we release our fine-tuning code zoobot. Zoobot is accessible to researchers with no prior experience in deep learning. 
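The similarity-search task in record 5 — finding galaxies that look like a query galaxy — amounts to nearest-neighbour lookup in the learnt representation space. A minimal cosine-similarity sketch follows; the 2-D vectors are toy stand-ins for Zoobot's actual high-dimensional representations.

```python
import numpy as np

def most_similar(query_rep, reps, k=3):
    """Rank galaxies by cosine similarity of their learnt representations
    to a query galaxy's representation. This is the general idea behind
    the paper's similarity search; the real pipeline uses features from
    the trained Zoobot models."""
    q = query_rep / np.linalg.norm(query_rep)
    r = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    sims = r @ q                      # cosine similarity to the query
    return np.argsort(-sims)[:k]      # indices, most similar first

# Toy catalogue of three galaxies in a 2-D representation space.
reps = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
order = most_similar(np.array([1.0, 0.05]), reps, k=3)
```

With a single labelled query (e.g. one galaxy tagged ‘#diffuse’), ranking the whole catalogue this way retrieves visually similar galaxies without any retraining, which is what makes one-label search feasible.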
  6. ABSTRACT Galaxy Zoo: Clump Scout is a web-based citizen science project designed to identify and spatially locate giant star forming clumps in galaxies that were imaged by the Sloan Digital Sky Survey Legacy Survey. We present a statistically driven software framework that is designed to aggregate two-dimensional annotations of clump locations provided by multiple independent Galaxy Zoo: Clump Scout volunteers and generate a consensus label that identifies the locations of probable clumps within each galaxy. The statistical model our framework is based on allows us to assign false-positive probabilities to each of the clumps we identify, to estimate the skill levels of each of the volunteers who contribute to Galaxy Zoo: Clump Scout and also to quantitatively assess the reliability of the consensus labels that are derived for each subject. We apply our framework to a data set containing 3 561 454 two-dimensional points, which constitute 1 739 259 annotations of 85 286 distinct subjects provided by 20 999 volunteers. Using this data set, we identify 128 100 potential clumps distributed among 44 126 galaxies. This data set can be used to study the prevalence and demographics of giant star forming clumps in low-redshift galaxies. The code for our aggregation software framework is publicly available at: https://github.com/ou-astrophysics/BoxAggregator
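The core aggregation problem in record 6 — turning many volunteers' 2-D marks into consensus clump locations with a reliability estimate — can be illustrated with a much simpler greedy scheme than the paper's statistical model (which also infers per-volunteer skill). The linking radius, vote-fraction cut, and function name are all assumptions for illustration.

```python
import math

def aggregate_clicks(clicks, n_volunteers, link_radius=5.0, min_frac=0.5):
    """Group (x, y) clump marks that fall within link_radius of an
    existing cluster centre, then keep clusters marked by at least
    min_frac of the volunteers. Returns (x, y, vote_fraction) tuples.

    A simplified stand-in for the BoxAggregator framework, which uses a
    principled statistical model rather than greedy linking."""
    clusters = []  # each cluster: [sum_x, sum_y, count]
    for x, y in clicks:
        for c in clusters:
            cx, cy = c[0] / c[2], c[1] / c[2]
            if math.hypot(x - cx, y - cy) <= link_radius:
                c[0] += x; c[1] += y; c[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return [(c[0] / c[2], c[1] / c[2], c[2] / n_volunteers)
            for c in clusters if c[2] / n_volunteers >= min_frac]

# 3 volunteers: two mark a clump near (10, 10); one stray click at (40, 40).
marks = [(10.0, 10.0), (11.0, 10.5), (40.0, 40.0)]
consensus = aggregate_clicks(marks, n_volunteers=3)
```

The returned vote fraction plays the role of a crude confidence; the paper's model instead derives calibrated false-positive probabilities per clump.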
  7. Abstract Giant, star-forming clumps are a common feature prevalent among high-redshift star-forming galaxies and play a critical role in shaping their chaotic morphologies, and yet their nature and role in galaxy evolution remain to be fully understood. A majority of the effort to study clumps has been focused at high redshifts, and local clump studies have often suffered from small sample sizes. In this work, we present an analysis of clump properties in the local universe, performed for the first time with a statistically significant sample. With the help of the citizen science-powered Galaxy Zoo: Hubble project, we select a sample of 92 clumpy galaxies at z < 0.06 from the Sloan Digital Sky Survey Stripe 82 field. Within this sample, we identify 543 clumps using a contrast-based image analysis algorithm and perform photometry as well as estimate their stellar population properties. The overall properties of our z < 0.06 clump sample are comparable to the high-redshift clumps. However, contrary to the high-redshift studies, we find no evidence of a gradient in clump ages or masses as a function of their galactocentric distances. Our results challenge the inward migration scenario for clump evolution in the local universe, potentially suggesting a larger contribution of ex situ clumps and/or longer clump migration timescales.
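The gradient test in record 7 — asking whether clump age varies with galactocentric distance — is, in its simplest form, a slope fit. The sketch below shows that bare form with invented data; a real analysis would propagate the age uncertainties and test the slope's significance.

```python
import numpy as np

def radial_gradient(distances, ages):
    """Least-squares slope and intercept of clump age versus
    galactocentric distance. A slope consistent with zero is the kind of
    non-detection the study reports; this toy version ignores the
    measurement uncertainties a real fit would weight by."""
    slope, intercept = np.polyfit(distances, ages, 1)
    return slope, intercept

# Invented sample: four clumps with identical ages (Myr), flat by
# construction, so the fitted gradient should vanish.
d = np.array([0.5, 1.0, 1.5, 2.0])       # galactocentric distance (kpc)
ages = np.array([100.0, 100.0, 100.0, 100.0])
slope, _ = radial_gradient(d, ages)
```

Under the inward-migration scenario, older clumps should sit at smaller radii (a negative slope); a flat slope is what motivates the ex situ and slow-migration interpretations in the abstract.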
  8. ABSTRACT We use Bayesian convolutional neural networks and a novel generative model of Galaxy Zoo volunteer responses to infer posteriors for the visual morphology of galaxies. Bayesian CNNs can learn from galaxy images with uncertain labels and then, for previously unlabelled galaxies, predict the probability of each possible label. Our posteriors are well-calibrated (e.g. for predicting bars, we achieve coverage errors of 11.8 per cent within a vote fraction deviation of 0.2) and hence are reliable for practical use. Further, using our posteriors, we apply the active learning strategy BALD to request volunteer responses for the subset of galaxies which, if labelled, would be most informative for training our network. We show that training our Bayesian CNNs using active learning requires up to 35–60 per cent fewer labelled galaxies, depending on the morphological feature being classified. By combining human and machine intelligence, Galaxy Zoo will be able to classify surveys of any conceivable scale on a time-scale of weeks, providing massive and detailed morphology catalogues to support research into galaxy evolution.
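The BALD acquisition used in record 8 scores each galaxy by the mutual information between its predicted label and the model parameters, typically estimated from Monte Carlo dropout samples. A minimal sketch follows; the array shapes and toy probabilities are assumptions, and the real pipeline applies this per Galaxy Zoo question.

```python
import numpy as np

def bald_scores(mc_probs):
    """BALD acquisition from Monte Carlo posterior samples.

    mc_probs: array of shape (n_samples, n_galaxies, n_classes) holding
    softmax outputs from repeated stochastic forward passes (e.g. MC
    dropout). Returns, per galaxy, the mutual information between the
    prediction and the model parameters: predictive entropy minus the
    expected per-sample entropy. Higher scores mean volunteer labels for
    that galaxy would be more informative."""
    eps = 1e-12  # guard against log(0)
    mean_p = mc_probs.mean(axis=0)                                  # predictive mean
    h_mean = -(mean_p * np.log(mean_p + eps)).sum(-1)               # entropy of the mean
    h_each = -(mc_probs * np.log(mc_probs + eps)).sum(-1).mean(0)   # mean of entropies
    return h_mean - h_each

# Galaxy 0: the samples disagree (model uncertainty -> high BALD score).
# Galaxy 1: both samples agree confidently (low BALD score).
samples = np.array([
    [[0.9, 0.1], [0.95, 0.05]],
    [[0.1, 0.9], [0.95, 0.05]],
])
scores = bald_scores(samples)
```

BALD deliberately separates model uncertainty from inherent label noise: a galaxy the samples agree is ambiguous scores low, because another volunteer label would not change the model, which is why it saves the 35–60 per cent of labels quoted in the abstract.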